feat(helm): Add S3 storage support for archives, streams, and log inputs. #1825
junhaoliao merged 14 commits into y-scope:main from
Conversation
Walkthrough

Conditionalizes volume/volumeMount provisioning and adds S3-backed staging paths and optional AWS config directory support across Helm templates; introduces two AWS config helper templates and increments the chart version. ConfigMap values are expanded to describe fs and s3 storage options and AWS authentication.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs
Suggested reviewers
Pre-merge checks

✅ Passed checks (3 passed)
# Conflicts:
#   tools/deployment/package-helm/Chart.yaml
#   tools/deployment/package-helm/templates/api-server-deployment.yaml
#   tools/deployment/package-helm/templates/configmap.yaml
#   tools/deployment/package-helm/templates/garbage-collector-deployment.yaml
#   tools/deployment/package-helm/templates/garbage-collector-logs-pv.yaml
#   tools/deployment/package-helm/templates/garbage-collector-logs-pvc.yaml
#   tools/deployment/package-helm/test.sh
…garbage collector YAML files
# Conflicts:
#   tools/deployment/package-helm/Chart.yaml
#   tools/deployment/package-helm/templates/compression-scheduler-deployment.yaml
#   tools/deployment/package-helm/templates/configmap.yaml
#   tools/deployment/package-helm/templates/garbage-collector-deployment.yaml
#   tools/deployment/package-helm/templates/query-worker-deployment.yaml
#   tools/deployment/package-helm/templates/webui-deployment.yaml
Actionable comments posted: 7
Fix all issues with AI Agents 🤖
In @tools/deployment/package-helm/templates/_helpers.tpl:
- Around line 250-260: The helper "clp.awsConfigVolumeMount" assumes
.Values.clpConfig.aws_config_directory is set and will render an invalid mount
if it is null; update the helper's docstring to state the precondition that
.Values.clpConfig.aws_config_directory must be non-null (e.g., "Requires
.Values.clpConfig.aws_config_directory to be set/non-empty; callers must guard
with an if check"), and optionally mention that callers should wrap the helper
with {{- if .Values.clpConfig.aws_config_directory }} ... {{- end }} to avoid
misuse.
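For illustration, a guarded call could look like the sketch below; the surrounding volumeMounts entries are hypothetical, and only the helper name is taken from the chart:

```yaml
volumeMounts:
  - name: "config"
    mountPath: "/etc/clp-config.yml"
  {{- if .Values.clpConfig.aws_config_directory }}
  # Only render the AWS config mount when the directory is configured;
  # the helper itself assumes the value is non-null.
  - {{ include "clp.awsConfigVolumeMount" . | indent 4 | trim }}
  {{- end }}
```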
- Around line 268-273: The Helm template define "clp.awsConfigVolume" uses a
hostPath with type "Directory" sourced from
.Values.clpConfig.aws_config_directory; update the deployment docs and add a
comment in values.yaml next to clpConfig.aws_config_directory to explicitly
state that when this value is set, the specified directory must already exist on
every target node (or be created during pre-deployment) because Kubernetes will
fail to mount a hostPath Directory that does not exist. Ensure the note mentions
potential alternatives (e.g., using a different volume type) and the exact
values key (.Values.clpConfig.aws_config_directory) so users can find and
configure it easily.
In @tools/deployment/package-helm/templates/configmap.yaml:
- Around line 286-295: The template accesses .aws_authentication.type without
ensuring .aws_authentication exists; change the conditional to guard for nil,
e.g. use an AND: replace {{- if eq .aws_authentication.type "profile" }} with
{{- if and .aws_authentication (eq .aws_authentication.type "profile") }} so
StreamFilesS3Profile is only read when aws_authentication is present, keeping
the else branch that emits null.
- Line 288: The template currently renders "StreamFilesS3Region" as a JSON
string "null" because {{ .region_code | default "null" | quote }} wraps the
default in quotes; change the template to emit a real JSON null when
.region_code is unset by using a conditional: replace the expression with {{ if
.region_code }}{{ .region_code | quote }}{{ else }}null{{ end }} so
"StreamFilesS3Region" gets either a quoted region string or an unquoted null.
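Written inline in the ConfigMap template, the suggested conditional would read (a minimal sketch of the single line):

```yaml
{{/* Emits a quoted string when region_code is set, otherwise a bare JSON null. */}}
"StreamFilesS3Region": {{ if .region_code }}{{ .region_code | quote }}{{ else }}null{{ end }},
```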
In @tools/deployment/package-helm/templates/garbage-collector-deployment.yaml:
- Around line 73-74: The tmp volume mount incorrectly uses mountPath "/var/log";
update the container volumeMount with name "tmp" to use mountPath "/var/tmp"
(matching other components like compression-worker) to avoid conflicts with the
existing logs mount at "/var/log/garbage_collector".
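The corrected mounts would look roughly like this; the "logs" entry is included only to show the conflict being avoided, and exact names/paths beyond those quoted above are illustrative:

```yaml
volumeMounts:
  - name: "tmp"
    mountPath: "/var/tmp"  # was "/var/log", which clashed with the logs mount below
  - name: "logs"
    mountPath: "/var/log/garbage_collector"
```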
In @tools/deployment/package-helm/templates/query-worker-staged-streams-pv.yaml:
- Around line 1-11: The values referenced by the PV templates
(.Values.clpConfig.stream_output.storage.staging_directory and
.Values.clpConfig.archive_output.storage.staging_directory) are missing from
values.yaml causing null paths and invalid PersistentVolume manifests; add these
two keys to values.yaml (or expose them as configurable entries) with sensible
defaults such as "/var/data/staged-streams" for
clpConfig.stream_output.storage.staging_directory and
"/var/data/staged-archives" for
clpConfig.archive_output.storage.staging_directory so the Helm templates that
call include "clp.createLocalPv" receive non-null staging_directory values and
produce valid PV specs.
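A minimal values.yaml addition matching the suggestion above (key layout assumed to mirror the existing clpConfig structure):

```yaml
clpConfig:
  archive_output:
    storage:
      # Local staging area for archives awaiting upload when type is "s3".
      staging_directory: "/var/data/staged-archives"
  stream_output:
    storage:
      # Local staging area for extracted streams awaiting upload when type is "s3".
      staging_directory: "/var/data/staged-streams"
```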
In @tools/deployment/package-helm/templates/webui-deployment.yaml:
- Around line 62-70: The template injects AWS_ACCESS_KEY_ID and
AWS_SECRET_ACCESS_KEY directly from .Values.clpConfig.stream_output.storage,
which exposes credentials in plain text; instead, move AWS creds into a
Kubernetes Secret (e.g., a secret named via the same naming helper used
elsewhere like include "clp.fullname") and change the deployment env entries for
AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY to use valueFrom.secretKeyRef
referencing that Secret (follow the same pattern used for the DB credentials
block and reference secret keys such as access-key-id and secret-access-key);
remove the direct usage of .s3_config.aws_authentication.credentials.* in the
deployment template and ensure values.yaml contains a credentials.aws section
intended only for secure injection (or require users to supply the Secret
externally).
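A sketch of the secretKeyRef pattern described above; the secret name suffix and key names (`access-key-id`, `secret-access-key`) are hypothetical and would need to match whatever Secret the chart creates or the user supplies:

```yaml
env:
  - name: "AWS_ACCESS_KEY_ID"
    valueFrom:
      secretKeyRef:
        # Hypothetical secret named via the chart's fullname helper.
        name: {{ printf "%s-aws-credentials" (include "clp.fullname" .) | quote }}
        key: "access-key-id"
  - name: "AWS_SECRET_ACCESS_KEY"
    valueFrom:
      secretKeyRef:
        name: {{ printf "%s-aws-credentials" (include "clp.fullname" .) | quote }}
        key: "secret-access-key"
```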
📜 Review details
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (21)
tools/deployment/package-helm/Chart.yaml
tools/deployment/package-helm/templates/_helpers.tpl
tools/deployment/package-helm/templates/api-server-deployment.yaml
tools/deployment/package-helm/templates/compression-scheduler-deployment.yaml
tools/deployment/package-helm/templates/compression-worker-deployment.yaml
tools/deployment/package-helm/templates/compression-worker-staged-archives-pv.yaml
tools/deployment/package-helm/templates/compression-worker-staged-archives-pvc.yaml
tools/deployment/package-helm/templates/configmap.yaml
tools/deployment/package-helm/templates/garbage-collector-deployment.yaml
tools/deployment/package-helm/templates/garbage-collector-logs-pv.yaml
tools/deployment/package-helm/templates/garbage-collector-logs-pvc.yaml
tools/deployment/package-helm/templates/query-worker-deployment.yaml
tools/deployment/package-helm/templates/query-worker-staged-streams-pv.yaml
tools/deployment/package-helm/templates/query-worker-staged-streams-pvc.yaml
tools/deployment/package-helm/templates/shared-data-archives-pv.yaml
tools/deployment/package-helm/templates/shared-data-archives-pvc.yaml
tools/deployment/package-helm/templates/shared-data-streams-pv.yaml
tools/deployment/package-helm/templates/shared-data-streams-pvc.yaml
tools/deployment/package-helm/templates/webui-deployment.yaml
tools/deployment/package-helm/test.sh
tools/deployment/package-helm/values.yaml
🧰 Additional context used
🧠 Learnings (8)
📓 Common learnings
Learnt from: hoophalab
Repo: y-scope/clp PR: 1767
File: components/clp-py-utils/clp_py_utils/clp_config.py:580-585
Timestamp: 2025-12-10T23:32:24.363Z
Learning: In PR #1767, custom S3 endpoint support was added to CLP. The S3Config.region_code field is now optional (NonEmptyStr | None) because custom S3-compatible endpoints (MinIO, LocalStack, etc.) use path-style URLs and don't require AWS region codes. Only AWS S3 endpoints require region_code. Presto integration still requires region_code because it only works with AWS S3.
📚 Learning: 2025-05-06T10:07:04.654Z
Learnt from: kirkrodrigues
Repo: y-scope/clp PR: 881
File: components/core/tools/scripts/lib_install/macos/install-all.sh:11-12
Timestamp: 2025-05-06T10:07:04.654Z
Learning: In CLP installation scripts, temporary directories with downloaded files should not be automatically cleaned up on failure (e.g., with EXIT traps) to preserve artifacts for debugging purposes.
Applied to files:
tools/deployment/package-helm/test.sh
📚 Learning: 2025-12-04T03:31:55.239Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1690
File: tools/deployment/package/docker-compose-all.yaml:424-427
Timestamp: 2025-12-04T03:31:55.239Z
Learning: In tools/deployment/package/docker-compose-all.yaml, the query-worker service writes to /var/data/streams (CLP_STREAM_OUTPUT_DIR_HOST mount), so this directory must remain read-write and should not be mounted with :ro flag.
Applied to files:
tools/deployment/package-helm/templates/webui-deployment.yaml
tools/deployment/package-helm/templates/shared-data-streams-pvc.yaml
tools/deployment/package-helm/templates/compression-worker-deployment.yaml
tools/deployment/package-helm/templates/query-worker-staged-streams-pv.yaml
tools/deployment/package-helm/templates/api-server-deployment.yaml
tools/deployment/package-helm/templates/query-worker-staged-streams-pvc.yaml
tools/deployment/package-helm/templates/query-worker-deployment.yaml
📚 Learning: 2025-07-23T09:54:45.185Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Applied to files:
tools/deployment/package-helm/templates/compression-worker-staged-archives-pvc.yaml
📚 Learning: 2025-10-07T07:54:32.427Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-py-utils/clp_py_utils/clp_config.py:47-47
Timestamp: 2025-10-07T07:54:32.427Z
Learning: In components/clp-py-utils/clp_py_utils/clp_config.py, the CONTAINER_AWS_CONFIG_DIRECTORY constant is intentionally set to pathlib.Path("/") / ".aws" (i.e., `/.aws`) rather than a user-specific home directory. This hardcoded path is part of the container orchestration design.
Applied to files:
tools/deployment/package-helm/values.yaml
📚 Learning: 2025-11-28T15:12:53.530Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1655
File: components/clp-package-utils/clp_package_utils/controller.py:183-189
Timestamp: 2025-11-28T15:12:53.530Z
Learning: In the y-scope/clp project, root database credentials (root username and password) are required configuration items for all deployments. The `credentials[ClpDbUserType.ROOT]` entry is guaranteed to exist and can be accessed directly without optional handling.
Applied to files:
tools/deployment/package-helm/values.yaml
📚 Learning: 2025-01-13T21:18:54.629Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 662
File: components/job-orchestration/job_orchestration/executor/query/extract_stream_task.py:167-186
Timestamp: 2025-01-13T21:18:54.629Z
Learning: In the CLP system, streams can be deterministically recreated if needed for debugging purposes, so it's acceptable to delete them even if their S3 upload fails.
Applied to files:
tools/deployment/package-helm/templates/query-worker-staged-streams-pv.yaml
tools/deployment/package-helm/templates/query-worker-staged-streams-pvc.yaml
📚 Learning: 2025-12-19T05:03:42.629Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1818
File: tools/deployment/package-helm/templates/configmap.yaml:12-12
Timestamp: 2025-12-19T05:03:42.629Z
Learning: In the y-scope/clp Helm chart, the `api_server.host` configuration field in the ConfigMap (tools/deployment/package-helm/templates/configmap.yaml) is unused because no other k8s internal services need to reach the API server—it's only accessed from outside the cluster via NodePort.
Applied to files:
tools/deployment/package-helm/templates/api-server-deployment.yaml
🪛 YAMLlint (1.37.1)
tools/deployment/package-helm/templates/garbage-collector-logs-pv.yaml
[error] 1-1: syntax error: expected the node content, but found '-'
(syntax)
tools/deployment/package-helm/templates/shared-data-archives-pvc.yaml
[error] 1-1: syntax error: expected the node content, but found '-'
(syntax)
[error] 8-8: too many spaces inside braces
(braces)
[error] 9-9: too many spaces inside braces
(braces)
tools/deployment/package-helm/templates/shared-data-archives-pv.yaml
[error] 1-1: syntax error: expected the node content, but found '-'
(syntax)
[error] 11-11: too many spaces inside braces
(braces)
tools/deployment/package-helm/templates/shared-data-streams-pvc.yaml
[error] 1-1: syntax error: expected the node content, but found '-'
(syntax)
[error] 8-8: too many spaces inside braces
(braces)
[error] 9-9: too many spaces inside braces
(braces)
tools/deployment/package-helm/templates/compression-worker-staged-archives-pvc.yaml
[error] 1-1: syntax error: expected the node content, but found '-'
(syntax)
[error] 8-8: too many spaces inside braces
(braces)
[error] 9-9: too many spaces inside braces
(braces)
tools/deployment/package-helm/templates/garbage-collector-logs-pvc.yaml
[error] 1-1: syntax error: expected the node content, but found '-'
(syntax)
tools/deployment/package-helm/templates/shared-data-streams-pv.yaml
[error] 1-1: syntax error: expected the node content, but found '-'
(syntax)
[error] 11-11: too many spaces inside braces
(braces)
tools/deployment/package-helm/templates/garbage-collector-deployment.yaml
[error] 1-1: syntax error: expected the node content, but found '-'
(syntax)
tools/deployment/package-helm/templates/query-worker-staged-streams-pv.yaml
[error] 1-1: syntax error: expected the node content, but found '-'
(syntax)
[error] 10-10: too many spaces inside braces
(braces)
[error] 11-11: too many spaces inside braces
(braces)
tools/deployment/package-helm/templates/query-worker-staged-streams-pvc.yaml
[error] 1-1: syntax error: expected the node content, but found '-'
(syntax)
[error] 8-8: too many spaces inside braces
(braces)
[error] 9-9: too many spaces inside braces
(braces)
tools/deployment/package-helm/templates/compression-worker-staged-archives-pv.yaml
[error] 1-1: syntax error: expected the node content, but found '-'
(syntax)
[error] 10-10: too many spaces inside braces
(braces)
[error] 11-11: too many spaces inside braces
(braces)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: package-image
🔇 Additional comments (26)
tools/deployment/package-helm/Chart.yaml (1)
3-3: Version bump is appropriate for the feature release. The chart version increment from 0.1.2-dev.14 to 0.1.2-dev.15 correctly reflects the addition of S3 storage support and associated template/configuration enhancements across the deployment manifests.
tools/deployment/package-helm/test.sh (1)
58-59: LGTM! Directory structure supports S3 staging. The addition of the `staged-archives` and `staged-streams` directories correctly aligns with the PR's S3 storage support feature, where these staging areas temporarily hold data before upload to S3.
tools/deployment/package-helm/templates/shared-data-archives-pvc.yaml (1)
1-9: LGTM! Conditional PVC creation is correctly implemented. The conditional guard appropriately creates the shared archives PVC only when using filesystem storage. This aligns with the broader pattern of conditionally mounting volumes based on storage type.
tools/deployment/package-helm/templates/shared-data-archives-pv.yaml (1)
1-11: LGTM! PV conditional aligns with corresponding PVC. The PV creation is correctly gated on filesystem storage type, ensuring the PV and PVC are created together. The `hostPath` binding from configuration is appropriate for local storage.
tools/deployment/package-helm/templates/query-worker-staged-streams-pvc.yaml (1)
1-9: LGTM! Staging PVC correctly scoped for S3 storage. The conditional creation of the staging PVC when using S3 storage is appropriate, as streams need to be staged locally before upload. The `ReadWriteOnce` access mode is correct for single-writer staging scenarios. Verify that the 50Gi capacity is sufficient for your expected staging workload. Consider making this configurable via `values.yaml` if different deployments have varying staging requirements.
Based on learnings, streams can be deleted if S3 upload fails, so staging volume sizing is primarily a performance consideration rather than a data durability concern.
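If the capacity were made configurable as suggested, the PVC spec might read as follows; `staging_capacity` is a hypothetical values key, not one the chart currently defines:

```yaml
spec:
  accessModes:
    - "ReadWriteOnce"
  resources:
    requests:
      # Fall back to the current hardcoded size when the key is unset.
      storage: {{ .Values.clpConfig.stream_output.storage.staging_capacity | default "50Gi" | quote }}
```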
tools/deployment/package-helm/templates/garbage-collector-logs-pv.yaml (1)
1-3: LGTM! Formatting improvement enhances readability. The multi-line conditional format improves readability without changing the logic. This aligns with the formatting style used in other templates in this PR.
tools/deployment/package-helm/templates/shared-data-streams-pv.yaml (1)
1-11: LGTM! Streams PV conditional follows established pattern. The conditional guard for filesystem storage is consistent with the streams PVC and archives PV approach. Both the PV and PVC use the matching conditional guard (`eq .Values.clpConfig.stream_output.storage.type "fs"`), and the 20Gi capacity with ReadWriteMany access mode is appropriate for streams, ensuring PV and PVC creation remain synchronized.
1-9: LGTM! Conditional PVC creation is correct. The conditional logic appropriately creates the shared-data-streams PVC only when filesystem storage is configured. When S3 storage is used, staged volumes are created instead, making this shared PVC unnecessary.
tools/deployment/package-helm/templates/garbage-collector-logs-pvc.yaml (1)
1-3: LGTM! Formatting improvement enhances readability. The multi-line conditional format is clearer and consistent with the style used in other templates within this PR.
tools/deployment/package-helm/templates/compression-worker-staged-archives-pv.yaml (1)
1-11: Ensure staging_directory validation for S3 storage. The PV creation follows the correct pattern. Similar to the staged-streams PV, verify that `.Values.clpConfig.archive_output.storage.staging_directory` is validated as required when `storage.type` is "s3", and ensure the directory exists on worker nodes.
Additionally, consider whether the 50Gi capacity for both staged PVs (archives and streams) should be independently configurable, as ingestion patterns might differ between archives and streams.
tools/deployment/package-helm/templates/compression-worker-staged-archives-pvc.yaml (1)
1-9: Consider making staging volume capacity configurable. The PVC logic is correct for S3 storage. However, the 50Gi capacity for the staged-archives volume is hardcoded. Depending on ingestion volume and network speed to S3, users might need to adjust this capacity.
Consider adding a configurable value such as `.Values.clpConfig.archive_output.storage.staging_capacity` to allow users to tune this based on their workload characteristics.
tools/deployment/package-helm/values.yaml (1)
156-158: LGTM - Clean configuration addition. The new `aws_config_directory` configuration is well-documented and defaults to null (disabled), providing safe opt-in behavior for AWS config mounting.
tools/deployment/package-helm/templates/compression-scheduler-deployment.yaml (1)
83-88: LGTM - Conditional volume mounting implemented correctly. The conditional mounting logic properly guards helper invocations and aligns with the storage-type-based design. AWS config is mounted only when configured, and the logs input mount is added only for filesystem-based logs.
Also applies to: 108-113
tools/deployment/package-helm/templates/api-server-deployment.yaml (1)
70-76: LGTM - Streams mount correctly conditionalized. The streams volume mount is now properly conditional on `stream_output.storage.type == "fs"`, which aligns with the S3 storage support goals. When using S3 storage, the api-server won't need local access to streams.
Also applies to: 100-106
tools/deployment/package-helm/templates/webui-deployment.yaml (1)
83-95: LGTM - Conditional volume mounting implemented correctly. The conditional mounting logic for AWS config, logs input, and streams is properly implemented with appropriate guards. This correctly enables the webui to work with both filesystem and S3 storage backends.
Also applies to: 114-126
tools/deployment/package-helm/templates/compression-worker-deployment.yaml (2)
62-80: LGTM! Conditional volume mounts correctly handle fs vs s3 storage types. The conditional logic properly handles:
- Archives mounted from the shared-data PVC for the `fs` type
- Staged-archives mounted for the `s3` type (for local staging before S3 upload)
- AWS config directory mounted when configured
- Logs input volume mounted only for the `fs` type
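As a sketch, conditionals of this kind typically look like the following in the volumeMounts section; volume names and the archives path are illustrative, not quoted from the chart:

```yaml
volumeMounts:
  {{- if eq .Values.clpConfig.archive_output.storage.type "fs" }}
  - name: "archives"
    mountPath: "/var/data/archives"
  {{- else if eq .Values.clpConfig.archive_output.storage.type "s3" }}
  # Local staging area for archives awaiting upload to S3.
  - name: "staged-archives"
    mountPath: {{ .Values.clpConfig.archive_output.storage.staging_directory | quote }}
  {{- end }}
  {{- if .Values.clpConfig.aws_config_directory }}
  - name: "aws-config"
    mountPath: {{ .Values.clpConfig.aws_config_directory | quote }}
    readOnly: true
  {{- end }}
```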
103-124: LGTM! Volume definitions align with their corresponding mounts. The volumes section correctly mirrors the volumeMounts conditionals, ensuring that volumes are only created when their corresponding mounts are enabled.
tools/deployment/package-helm/templates/garbage-collector-deployment.yaml (3)
1-3: YAMLlint false positive - Helm templating syntax is correct. The static analysis error is a false positive. YAMLlint doesn't understand Helm templating, and `{{- if or ... }}` is valid Helm syntax for conditional deployment based on retention period settings.
75-91: LGTM! Conditional volume mounts for archives, streams, and AWS config. The garbage-collector correctly conditionally mounts:
- Archives volume only when `archive_output.storage.type` is `fs`
- Streams volume only when `stream_output.storage.type` is `fs`
- AWS config when `aws_config_directory` is set
This aligns with the S3 storage support feature, where the garbage-collector would access S3 directly rather than through mounted volumes.
103-124: LGTM! Volume definitions match the conditional mount logic.tools/deployment/package-helm/templates/query-worker-deployment.yaml (2)
53-79: LGTM! Volume mounts correctly handle fs vs s3 storage types. The query-worker appropriately:
- Mounts config as read-only
- Conditionally mounts archives for the `fs` storage type
- Mounts streams for `fs`, or staged-streams for the `s3` storage type
Based on learnings, the streams/staged-streams directories correctly remain read-write (no `:ro` flag) since the query-worker writes stream data.
97-122: LGTM! Volume definitions properly align with mount conditions.tools/deployment/package-helm/templates/configmap.yaml (4)
22-57: LGTM! S3 configuration for archive_output storage is well-structured. The configuration correctly:
- Supports both `fs` and `s3` storage types
- Makes `endpoint_url` and `region_code` optional (aligned with learnings about custom S3-compatible endpoints)
- Properly nests `aws_authentication` with support for `profile` and `credentials` types
93-116: LGTM! logs_input S3 configuration follows established pattern.
152-191: LGTM! stream_output storage S3 configuration mirrors the archive_output pattern. Good consistency in the configuration structure across archive_output and stream_output.
193-195: LGTM! Top-level aws_config_directory is correctly conditional.
```
{{/*
Creates a volumeMount for the AWS config directory.

@param {object} . Root template context
@return {string} YAML-formatted volumeMount definition
*/}}
{{- define "clp.awsConfigVolumeMount" -}}
name: "aws-config"
mountPath: {{ .Values.clpConfig.aws_config_directory | quote }}
readOnly: true
{{- end }}
```
🧹 Nitpick | 🔵 Trivial
Document the precondition that aws_config_directory must be non-null.
The helper assumes aws_config_directory is set. While all current callers guard with {{- if .Values.clpConfig.aws_config_directory }}, documenting this precondition in the helper's docstring would prevent future misuse.
📝 Suggested documentation enhancement
{{/*
Creates a volumeMount for the AWS config directory.
+@pre {string} .Values.clpConfig.aws_config_directory must be non-null
@param {object} . Root template context
@return {string} YAML-formatted volumeMount definition
*/}}🤖 Prompt for AI Agents
In @tools/deployment/package-helm/templates/_helpers.tpl around lines 250-260,
The helper "clp.awsConfigVolumeMount" assumes
.Values.clpConfig.aws_config_directory is set and will render an invalid mount
if it is null; update the helper's docstring to state the precondition that
.Values.clpConfig.aws_config_directory must be non-null (e.g., "Requires
.Values.clpConfig.aws_config_directory to be set/non-empty; callers must guard
with an if check"), and optionally mention that callers should wrap the helper
with {{- if .Values.clpConfig.aws_config_directory }} ... {{- end }} to avoid
misuse.
```
{{- define "clp.awsConfigVolume" -}}
name: "aws-config"
hostPath:
  path: {{ .Values.clpConfig.aws_config_directory | quote }}
  type: "Directory"
{{- end }}
```
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Search for setup scripts or documentation mentioning AWS config directory
rg -i "aws.?config.?dir" -g "*.md" -g "*.sh" -g "*.yaml" -C3

Repository: y-scope/clp
Length of output: 16202
🏁 Script executed:
#!/bin/bash
# Search for deployment guides and README files with setup instructions
find . -type f -name "README*" -o -name "INSTALL*" -o -name "DEPLOY*" | head -20
# Also search for helm chart documentation
find tools/deployment/package-helm -type f \( -name "*.md" -o -name "README*" \)

Repository: y-scope/clp
Length of output: 494
🏁 Script executed:
#!/bin/bash
# Search for Kubernetes/Helm deployment documentation
find docs -type f -name "*.md" | xargs rg -l "kubernetes\|helm\|deploy" -i
# Check if there's documentation in the helm chart directory
ls -la tools/deployment/package-helm/
# Look for prerequisites or setup documentation
rg "prerequisite\|setup\|initialize\|create.*director" docs -i -A2 | head -50

Repository: y-scope/clp
Length of output: 464
🏁 Script executed:
#!/bin/bash
# Check values.yaml for aws_config_directory documentation
sed -n '1,100p' tools/deployment/package-helm/values.yaml | grep -A5 -B2 "aws_config"
# Search for deployment guides and prerequisites documentation
find docs -type f -name "*.md" | head -20

Repository: y-scope/clp
Length of output: 976
🏁 Script executed:
#!/bin/bash
# Find deployment/installation guides
find docs/src/user-docs -type f -name "*.md" | sort
# Also check for any prerequisite documentation
rg "hostPath\|directory.*exist\|prerequisite" docs -i -B2 -A2

Repository: y-scope/clp
Length of output: 1602
🏁 Script executed:
#!/bin/bash
# Check the object storage config documentation for aws_config_directory setup
cat docs/src/user-docs/guides-using-object-storage/clp-config.md
# Also check if there's any helm/kubernetes specific deployment guide
find docs -type f -name "*.md" | xargs grep -l "helm\|kubernetes" -i

Repository: y-scope/clp
Length of output: 5960
🏁 Script executed:
#!/bin/bash
# Check if there's any helm-specific deployment documentation
find docs -type f -name "*.md" -path "*/user-docs/*" | xargs grep -l "kubernetes\|helm\|deploy" -i
# Check the quick-start guides which might have setup instructions
cat docs/src/user-docs/quick-start/index.md

Repository: y-scope/clp
Length of output: 4023
🏁 Script executed:
#!/bin/bash
# Check multi-host deployment guide for setup instructions
cat docs/src/user-docs/guides-multi-host.md | head -100
# Check if there's any helm/kubernetes specific deployment documentation
find docs -name "*.md" -type f | xargs grep -l "hostPath" -i

Repository: y-scope/clp
Length of output: 4012
🏁 Script executed:
#!/bin/bash
# Check for Helm-specific deployment documentation
find docs -name "*.md" -type f | xargs grep -l "helm\|values.yaml" -i
# Check Chart.yaml for documentation references
cat tools/deployment/package-helm/Chart.yaml
# Search for any kubernetes prerequisites or setup documentation
rg "kubernetes|k8s|helm" docs/src/user-docs -i -l

Repository: y-scope/clp
Length of output: 662
🏁 Script executed:
#!/bin/bash
# Check for helm hooks or init scripts that might create the aws_config_directory
rg "initContainers\|initContainer" tools/deployment/package-helm/templates -A5
# Search for any helm hooks
rg "hook\|init" tools/deployment/package-helm/templates -i
# Check the full values.yaml for aws_config_directory context
grep -A10 -B5 "aws_config_directory" tools/deployment/package-helm/values.yaml

Repository: y-scope/clp
Length of output: 496
🏁 Script executed:
#!/bin/bash
# Check if there's any documentation about Kubernetes deployment or prerequisites
find docs -name "*.md" -type f | xargs grep -l "prerequisites\|deployment\|install" -i | head -10
# Check the main README for kubernetes/helm instructions
cat README.md | head -100

Repository: y-scope/clp
Length of output: 6051
Document the hostPath directory prerequisite for AWS config in deployment guides.
When aws_config_directory is set to a non-null value, Kubernetes requires the directory to exist on all target nodes before pod scheduling. The Helm chart uses hostPath volumes with type: "Directory", which will fail to mount if the path doesn't exist. Add a note to the deployment documentation and/or values.yaml comments stating that users must ensure this directory exists on all worker nodes (or is created as part of pre-deployment setup).
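A values.yaml comment along these lines would cover it (a sketch; the wording is a suggestion, and the key itself already exists in the chart):

```yaml
clpConfig:
  # Host directory containing AWS config/credentials to mount (read-only) into
  # CLP pods. NOTE: the chart mounts this as a hostPath volume with
  # type "Directory", so the directory must already exist on every node that
  # may schedule CLP pods, or the mount will fail. Leave null to disable.
  aws_config_directory: null
```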
```
{{- if eq .Values.clpConfig.stream_output.storage.type "s3" }}
{{- with .Values.clpConfig.stream_output.storage.s3_config }}
"StreamFilesS3Region": {{ .region_code | default "null" | quote }},
"StreamFilesS3PathPrefix": {{ printf "%s/%s" .bucket .key_prefix | quote }},
{{- if eq .aws_authentication.type "profile" }}
"StreamFilesS3Profile": {{ .aws_authentication.profile | quote }},
{{- else }}
"StreamFilesS3Profile": null,
{{- end }}
{{- end }}{{/* with .Values.clpConfig.stream_output.storage.s3_config */}}
```
Potential nil pointer dereference when accessing aws_authentication.type.
Line 290 accesses .aws_authentication.type without first checking if aws_authentication exists. If aws_authentication is not defined in the values, this will cause a template rendering error.
🔎 Proposed fix to add nil check
{{- with .Values.clpConfig.stream_output.storage.s3_config }}
"StreamFilesS3Region": {{ .region_code | default "null" | quote }},
"StreamFilesS3PathPrefix": {{ printf "%s/%s" .bucket .key_prefix | quote }},
- {{- if eq .aws_authentication.type "profile" }}
+ {{- if and .aws_authentication (eq .aws_authentication.type "profile") }}
"StreamFilesS3Profile": {{ .aws_authentication.profile | quote }},
{{- else }}
"StreamFilesS3Profile": null,
      {{- end }}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
{{- if eq .Values.clpConfig.stream_output.storage.type "s3" }}
{{- with .Values.clpConfig.stream_output.storage.s3_config }}
"StreamFilesS3Region": {{ .region_code | default "null" | quote }},
"StreamFilesS3PathPrefix": {{ printf "%s/%s" .bucket .key_prefix | quote }},
{{- if and .aws_authentication (eq .aws_authentication.type "profile") }}
"StreamFilesS3Profile": {{ .aws_authentication.profile | quote }},
{{- else }}
"StreamFilesS3Profile": null,
{{- end }}
{{- end }}{{/* with .Values.clpConfig.stream_output.storage.s3_config */}}
🤖 Prompt for AI Agents
In @tools/deployment/package-helm/templates/configmap.yaml around lines 286-295,
The template accesses .aws_authentication.type without ensuring
.aws_authentication exists; change the conditional to guard for nil, e.g. use an
AND: replace {{- if eq .aws_authentication.type "profile" }} with {{- if and
.aws_authentication (eq .aws_authentication.type "profile") }} so
StreamFilesS3Profile is only read when aws_authentication is present, keeping
the else branch that emits null.
{{ .Values.clpConfig.stream_output.target_uncompressed_size | int }},
{{- if eq .Values.clpConfig.stream_output.storage.type "s3" }}
{{- with .Values.clpConfig.stream_output.storage.s3_config }}
"StreamFilesS3Region": {{ .region_code | default "null" | quote }},
Consider using unquoted null for proper JSON null value.
{{ .region_code | default "null" | quote }} will produce "null" (string) instead of null (JSON null). If the intent is to have a proper JSON null when region_code is not set, this needs adjustment.
🔎 Proposed fix for proper null handling
- "StreamFilesS3Region": {{ .region_code | default "null" | quote }},
+ {{- if .region_code }}
+ "StreamFilesS3Region": {{ .region_code | quote }},
+ {{- else }}
+ "StreamFilesS3Region": null,
+ {{- end }}
📝 Committable suggestion
{{- if .region_code }}
"StreamFilesS3Region": {{ .region_code | quote }},
{{- else }}
"StreamFilesS3Region": null,
{{- end }}
🤖 Prompt for AI Agents
In @tools/deployment/package-helm/templates/configmap.yaml around line 288, The
template currently renders "StreamFilesS3Region" as a JSON string "null" because
{{ .region_code | default "null" | quote }} wraps the default in quotes; change
the template to emit a real JSON null when .region_code is unset by using a
conditional: replace the expression with {{ if .region_code }}{{ .region_code |
quote }}{{ else }}null{{ end }} so "StreamFilesS3Region" gets either a quoted
region string or an unquoted null.
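To illustrate the difference, here is how the two variants render when region_code is unset (a sketch of the emitted JSON, not actual chart output):

```yaml
# Variant 1: {{ .region_code | default "null" | quote }}
# `default` supplies the *string* "null", and `quote` then wraps it, so the
# rendered line is:
#   "StreamFilesS3Region": "null"    # a 4-character string, truthy in JS
#
# Variant 2: {{ if .region_code }}{{ .region_code | quote }}{{ else }}null{{ end }}
# emits an unquoted literal in the else branch, so the rendered line is:
#   "StreamFilesS3Region": null      # a real JSON null
```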
| - name: "tmp" | ||
| mountPath: "/var/log" |
🧩 Analysis chain
🏁 Script executed:
rg -n "name: \"tmp\"" tools/deployment/package-helm/templates/ -A 3 -B 1Repository: y-scope/clp
Length of output: 1270
🏁 Script executed:
cat tools/deployment/package-helm/templates/garbage-collector-deployment.yaml | head -100
Repository: y-scope/clp
Length of output: 3845
🏁 Script executed:
fd -t f "compression-worker" tools/deployment/package-helm/templates/Repository: y-scope/clp
Length of output: 584
🏁 Script executed:
rg -l "compression-worker" tools/deployment/package-helm/templates/ -t yamlRepository: y-scope/clp
Length of output: 584
🏁 Script executed:
rg -n "mountPath.*tmp" tools/deployment/package-helm/templates/ -B 2Repository: y-scope/clp
Length of output: 380
🏁 Script executed:
rg -n "name: \"tmp\"" tools/deployment/package-helm/templates/ -A 2 -B 2Repository: y-scope/clp
Length of output: 1234
🏁 Script executed:
cat tools/deployment/package-helm/templates/compression-worker-deployment.yaml | grep -A 10 -B 10 "name: \"tmp\"" | head -40
Repository: y-scope/clp
Length of output: 37
🏁 Script executed:
# Check the full garbage-collector deployment to understand the context
sed -n '1,120p' tools/deployment/package-helm/templates/garbage-collector-deployment.yaml | tail -60
Repository: y-scope/clp
Length of output: 2427
🏁 Script executed:
# Look for any references to /var/log in the garbage-collector deployment
grep -n "var/log" tools/deployment/package-helm/templates/garbage-collector-deployment.yamlRepository: y-scope/clp
Length of output: 203
Change the tmp volume mount path from /var/log to /var/tmp.
The tmp volume is mounted at /var/log, which is inconsistent with how other components handle temporary volumes (compression-worker mounts tmp to /var/tmp). Additionally, mounting tmp to /var/log is non-standard and could create conflicts with the existing logs volume mount at /var/log/garbage_collector. This appears to be a copy-paste error.
🤖 Prompt for AI Agents
In @tools/deployment/package-helm/templates/garbage-collector-deployment.yaml
around lines 73-74, The tmp volume mount incorrectly uses mountPath "/var/log";
update the container volumeMount with name "tmp" to use mountPath "/var/tmp"
(matching other components like compression-worker) to avoid conflicts with the
existing logs mount at "/var/log/garbage_collector".
{{- if eq .Values.clpConfig.stream_output.storage.type "s3" }}
{{- include "clp.createLocalPv" (dict
  "root" .
  "component_category" "query-worker"
  "name" "staged-streams"
  "nodeRole" "worker"
  "capacity" "50Gi"
  "accessModes" (list "ReadWriteOnce")
  "hostPath" .Values.clpConfig.stream_output.storage.staging_directory
) }}
{{- end }}
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Search for staging_directory references
echo "=== Searching for staging_directory references ==="
rg -n "staging_directory" tools/deployment/package-helm/ -A 3 -B 3
echo ""
echo "=== Searching for values schema files ==="
fd "values.schema.json" tools/deployment/package-helm/
echo ""
echo "=== Searching for values.yaml with clpConfig.stream_output ==="
fd "values.yaml" tools/deployment/package-helm/ --exec head -200 {}Repository: y-scope/clp
Length of output: 7520
🏁 Script executed:
#!/bin/bash
# Find and read the complete values.yaml
echo "=== Reading full values.yaml ==="
fd "values.yaml" tools/deployment/package-helm/ -x wc -l {}
echo ""
echo "=== Searching for staging_directory in entire values.yaml ==="
fd "values.yaml" tools/deployment/package-helm/ -x rg "staging_directory" {} -B 2 -A 2
echo ""
echo "=== Looking at stream_output section in values.yaml ==="
fd "values.yaml" tools/deployment/package-helm/ -x sed -n '/stream_output:/,/^ [a-z_]*:/p' {}
echo ""
echo "=== Looking at archive_output section in values.yaml ==="
fd "values.yaml" tools/deployment/package-helm/ -x sed -n '/archive_output:/,/^ [a-z_]*:/p' {}Repository: y-scope/clp
Length of output: 1797
🏁 Script executed:
#!/bin/bash
# Examine configmap.yaml storage sections
echo "=== archive_output section in configmap.yaml ==="
rg -n "archive_output:" tools/deployment/package-helm/templates/configmap.yaml -A 30
echo ""
echo "=== stream_output section in configmap.yaml ==="
rg -n "stream_output:" tools/deployment/package-helm/templates/configmap.yaml -A 30Repository: y-scope/clp
Length of output: 2716
🏁 Script executed:
#!/bin/bash
# Check if there's any linking or if PV should use different approach
echo "=== Checking if staging_directory should be in values.yaml (searching git history pattern) ==="
fd "values.yaml" tools/deployment/package-helm/ -x head -n 171 {}
echo ""
echo "=== Checking createLocalPv helper to understand what it expects ==="
rg -n "createLocalPv" tools/deployment/package-helm/templates/ -B 2 -A 10
echo ""
echo "=== Looking for other PV creations to see if they reference staging_directory ==="
fd ".yaml" tools/deployment/package-helm/templates/ -x rg "createLocalPv"Repository: y-scope/clp
Length of output: 27443
🏁 Script executed:
#!/bin/bash
# Verify the exact impact by checking if staging_directory is referenced elsewhere
echo "=== All references to staging_directory in codebase ==="
rg "staging_directory" tools/deployment/package-helm/ -n
echo ""
echo "=== Verify this affects both stream and archive ==="
rg "query-worker-staged-streams-pv\|compression-worker-staged-archives-pv" tools/deployment/package-helm/ -lRepository: y-scope/clp
Length of output: 695
Add missing staging_directory fields to values.yaml for S3 storage mode.
The PV templates reference .Values.clpConfig.stream_output.storage.staging_directory and .Values.clpConfig.archive_output.storage.staging_directory, but these fields are not defined in values.yaml. When storage.type is set to "s3", Helm renders these as null, producing invalid PersistentVolume manifests that prevent pod scheduling.
Add the following to values.yaml:
clpConfig.stream_output.storage.staging_directory: "/var/data/staged-streams"
clpConfig.archive_output.storage.staging_directory: "/var/data/staged-archives"
Alternatively, expose these as configurable values to allow customization per deployment.
🧰 Tools
🪛 YAMLlint (1.37.1)
[error] 1-1: syntax error: expected the node content, but found '-'
(syntax)
[error] 10-10: too many spaces inside braces
(braces)
[error] 11-11: too many spaces inside braces
(braces)
🤖 Prompt for AI Agents
In @tools/deployment/package-helm/templates/query-worker-staged-streams-pv.yaml
around lines 1-11, The values referenced by the PV templates
(.Values.clpConfig.stream_output.storage.staging_directory and
.Values.clpConfig.archive_output.storage.staging_directory) are missing from
values.yaml causing null paths and invalid PersistentVolume manifests; add these
two keys to values.yaml (or expose them as configurable entries) with sensible
defaults such as "/var/data/staged-streams" for
clpConfig.stream_output.storage.staging_directory and
"/var/data/staged-archives" for
clpConfig.archive_output.storage.staging_directory so the Helm templates that
call include "clp.createLocalPv" receive non-null staging_directory values and
produce valid PV specs.
{{- with .Values.clpConfig.stream_output.storage }}
{{- if and (eq .type "s3") (eq .s3_config.aws_authentication.type "credentials") }}
- name: "AWS_ACCESS_KEY_ID"
  value: {{ .s3_config.aws_authentication.credentials.access_key_id | quote }}
- name: "AWS_SECRET_ACCESS_KEY"
  value: {{ .s3_config.aws_authentication.credentials.secret_access_key | quote }}
{{- end }}{{/* if and (eq .type "s3")
               (eq .s3_config.aws_authentication.type "credentials") */}}
{{- end }}{{/* with .Values.clpConfig.stream_output.storage */}}
Critical: AWS credentials must not be stored in plain text in values.yaml.
The AWS credentials are being read directly from values.yaml and injected as environment variables. This is a critical security vulnerability—credentials should never be stored in plain text in Helm values or ConfigMaps.
Required fix: Store AWS credentials in a Kubernetes Secret and reference them using valueFrom.secretKeyRef, following the same pattern used for database credentials (lines 42-51).
🔒 Recommended fix using Kubernetes Secrets
- Create a Secret for AWS credentials (in a new file or existing secrets file):
apiVersion: v1
kind: Secret
metadata:
name: {{ include "clp.fullname" . }}-aws-credentials
type: Opaque
data:
access-key-id: {{ .Values.credentials.aws.access_key_id | b64enc | quote }}
secret-access-key: {{ .Values.credentials.aws.secret_access_key | b64enc | quote }}
- Update the deployment to reference the Secret:
{{- with .Values.clpConfig.stream_output.storage }}
{{- if and (eq .type "s3") (eq .s3_config.aws_authentication.type "credentials") }}
- name: "AWS_ACCESS_KEY_ID"
- value: {{ .s3_config.aws_authentication.credentials.access_key_id | quote }}
+ valueFrom:
+ secretKeyRef:
+ name: {{ include "clp.fullname" $ }}-aws-credentials
+ key: "access-key-id"
- name: "AWS_SECRET_ACCESS_KEY"
- value: {{ .s3_config.aws_authentication.credentials.secret_access_key | quote }}
+ valueFrom:
+ secretKeyRef:
+ name: {{ include "clp.fullname" $ }}-aws-credentials
+ key: "secret-access-key"
{{- end }}
{{- end }}
- Move credentials to a dedicated section in
values.yaml:
credentials:
aws:
access_key_id: "" # Set via --set or secure values file
secret_access_key: ""Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In @tools/deployment/package-helm/templates/webui-deployment.yaml around lines
62-70, The template injects AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY directly
from .Values.clpConfig.stream_output.storage, which exposes credentials in plain
text; instead, move AWS creds into a Kubernetes Secret (e.g., a secret named via
the same naming helper used elsewhere like include "clp.fullname") and change
the deployment env entries for AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY to
use valueFrom.secretKeyRef referencing that Secret (follow the same pattern used
for the DB credentials block and reference secret keys such as access-key-id and
secret-access-key); remove the direct usage of
.s3_config.aws_authentication.credentials.* in the deployment template and
ensure values.yaml contains a credentials.aws section intended only for secure
injection (or require users to supply the Secret externally).
gibber9809
left a comment
Did a first pass with a bunch of questions/comments, but overall the PR looks good. As far as I can tell some of the rabbit comments seem worth addressing as well.
| "component_category" "compression-worker" | ||
| "name" "staged-archives" | ||
| "nodeRole" "worker" | ||
| "capacity" "50Gi" |
Not a huge deal, but this seems a bit high for staging.
Maybe something more like 10Gi (to match up with what we seem to provision for tmp space) might be more suitable, or 20-25Gi if we want a bit more headroom?
let's use 20Gi for both the staged archives and streams for now
| "root" . | ||
| "component_category" "compression-worker" | ||
| "name" "staged-archives" | ||
| "capacity" "50Gi" |
{{- if .credentials }}
credentials:
  access_key_id: {{ .credentials.access_key_id | quote }}
  secret_access_key: {{ .credentials.secret_access_key | quote }}
  {{- if .credentials.session_token }}
  session_token: {{ .credentials.session_token | quote }}
  {{- end }}
{{- end }}
My understanding is that credentials and other privileged data should be put in a "Secret" instead of a "ConfigMap" for Kubernetes. I guess this is pretty much the same way we currently treat credentials access, so it's probably fine for now, but it might be worth changing in the future. If we don't have an issue tracking this, it might be worth creating one.
Right, the plan is to isolate credentials from other clp-config fields so that their lifecycle can be managed more securely. Once this separation is in place, moving those values into Kubernetes Secrets would be more meaningful and appropriate.
@coderabbitai in case we haven't created an issue for such refactoring, create one to track
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Search for existing issues related to credentials, secrets, or configmap refactoring
gh issue list --repo y-scope/clp --state all --search "credentials secret configmap helm" --json number,title,state,url --limit 20
Length of output: 2
🏁 Script executed:
#!/bin/bash
# Create a new issue for tracking the credentials/secrets refactoring
gh issue create \
--repo y-scope/clp \
--title "Refactor Helm chart to use Kubernetes Secrets for credentials instead of ConfigMaps" \
--body "## Background
Currently, AWS credentials and other sensitive authentication data are stored in Kubernetes ConfigMaps within the Helm chart deployment. For better security practices, these should be stored in Kubernetes Secrets instead.
## Proposed approach
1. Isolate credentials from other clp-config fields so their lifecycle can be managed separately
2. Once isolated, move credential values into Kubernetes Secrets for more secure handling
## References
- PR #1825: https://github.com/y-scope/clp/pull/1825
- Discussion: https://github.com/y-scope/clp/pull/1825#discussion_r2665630820
- Requested by: @junhaoliao" \
--assignee junhaoliao
Length of output: 42
@junhaoliao, I've created issue #1844 to track the refactoring of credentials from ConfigMaps to Kubernetes Secrets. The issue includes the proposed approach and references this PR and discussion thread.
🐰📝
🧠 Learnings used
Learnt from: hoophalab
Repo: y-scope/clp PR: 1767
File: components/clp-py-utils/clp_py_utils/clp_config.py:580-585
Timestamp: 2025-12-10T23:32:24.363Z
Learning: In PR #1767, custom S3 endpoint support was added to CLP. The S3Config.region_code field is now optional (NonEmptyStr | None) because custom S3-compatible endpoints (MinIO, LocalStack, etc.) use path-style URLs and don't require AWS region codes. Only AWS S3 endpoints require region_code. Presto integration still requires region_code because it only works with AWS S3.
{{- if eq .Values.clpConfig.logs_input.type "fs" }}
"LogsInputRootDir": "/mnt/logs",
{{- else }}
"LogsInputRootDir": null,
Why does LogsInputRootDir and StreamFilesDir, StreamFilesS3Region , etc.. Need to be explicitly null when not in use?
In JavaScript/TypeScript, null and undefined are distinct values with different types. Consumers of settings.json (the web UI code) expect each value to be either null or a string. While we could also tolerate undefined when a field is missing, this would require additional handling and stricter typings.
| "component_category" "query-worker" | ||
| "name" "staged-streams" | ||
| "nodeRole" "worker" | ||
| "capacity" "50Gi" |
Pretty much same as my other comment. Arguably we might need more space for staged streams compared to staged archives, but maybe 25Gi could be enough as a default?
| "root" . | ||
| "component_category" "query-worker" | ||
| "name" "staged-streams" | ||
| "capacity" "50Gi" |
| - name: "AWS_ACCESS_KEY_ID" | ||
| value: {{ .s3_config.aws_authentication.credentials.access_key_id | quote }} | ||
| - name: "AWS_SECRET_ACCESS_KEY" | ||
| value: {{ .s3_config.aws_authentication.credentials.secret_access_key | quote }} |
Does this not support session token?
right, such support is missing in the webui. that said, officially we only support long-term credentials in the config interfaces: https://docs.yscope.com/clp/main/user-docs/guides-using-object-storage/clp-config.html#credentials
i understand that writing this configmap.yaml might be confusing:
{{- if .credentials.session_token }}
session_token: {{ .credentials.session_token | quote }}
{{- end }}
the rationale is that in case people really only have temp credentials and want to use them, they still can do so with the supported services.
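for example, a test deployment using temporary (STS) credentials could set something like the following in values.yaml (all values below are placeholders; the key layout mirrors the s3_config block quoted above):

```yaml
s3_config:
  region_code: "us-east-2"            # placeholder region
  bucket: "my-clp-test-bucket"        # placeholder bucket
  key_prefix: "archives/"             # placeholder prefix
  aws_authentication:
    type: "credentials"
    credentials:
      access_key_id: "ASIA..."        # temporary AWS credentials start with ASIA
      secret_access_key: "<redacted>"
      session_token: "<redacted>"     # rendered only because it is non-empty
```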
if you think that's too confusing, i can remove
No, I think it makes sense to keep for our own testing purposes. Since we don't document it for end users I don't think it should cause any issues.
gibber9809
left a comment
Everything looks good to me now.
Maybe for the PR title:
feat(helm): Add S3 storage support for archives, streams, and log inputs.
Description
Note
This PR is part of the ongoing work for #1309. More PRs will be submitted until the Helm chart is complete and fully functional.
This PR adds S3/AWS storage support to the CLP Helm chart, enabling archives and stream outputs to be stored in S3 instead of local filesystem storage. The changes include:
New Configuration Options:
- Added aws_config_directory to values.yaml for specifying the path to AWS config files (e.g., ~/.aws)
- Support for type: "s3" in archive_output.storage and stream_output.storage
- Support for type: "s3" in the logs_input configuration

New Resources:
- compression-worker-staged-archives-pv.yaml and compression-worker-staged-archives-pvc.yaml for staging archives before S3 upload
- query-worker-staged-streams-pv.yaml and query-worker-staged-streams-pvc.yaml for staging streams before S3 upload

Helper Templates:
- clp.awsConfigVolumeMount and clp.awsConfigVolume helpers for mounting AWS configuration

Deployment Changes:
- logsInputVolume mounts conditional on logs_input.type == "fs"
- shared-data-archives mounts conditional on archive_output.storage.type == "fs"
- shared-data-streams mounts conditional on stream_output.storage.type == "fs"
- aws_config_directory volume mounts (conditional) added to the compression-scheduler, compression-worker, query-worker, garbage-collector, and webui deployments
- Staged-archives mounts (when archive_output.storage.type == "s3")
- Staged-streams mounts (when stream_output.storage.type == "s3")

ConfigMap Updates:
- S3 storage configuration for archive_output.storage, stream_output.storage, and logs_input
- Added the aws_config_directory field to clp-config
- Updated webui-client-settings.json and webui-server-settings.json to support S3 stream files

Checklist
breaking change.
Validation performed
0. Configure S3 input and outputs in values.yaml:
(Note:
aws_authentication.type = profiledoes not work with the API server and therefore is untested.)1. Helm chart deployment
2. Configured clp-package/etc/clp-config.yaml
3. Log ingestion
4. Accessed the WebUI to confirm the logs were ingested
5. Performed a query in the WebUI with query string "1" and observed results were returned:
6. Clicked the "Original file" link which redirected to the log viewer with the logs' context restored: